REDUCING THE BATCH AND PLATFORM EFFECTS IN TRANSCRIPTOME DATA ANALYSIS

Sudesh Pundir, Northwestern University

High-throughput technologies, such as microarrays and massively parallel sequencing, have brought new challenges for gene expression, or transcriptome, data analysis. While numerous datasets for both disease and normal tissues (or cells) are publicly available, traditional statistical methods usually fail to integrate these datasets or to reproduce results across them. Two major issues, platform differences (microarray versus sequencing) and batch differences, can produce misleading results if they are not accounted for during statistical analysis. In addition, the high dimensionality, complexity, and sparsity of the data pose further problems for data integration. We will present an overview of current statistical methods and a comparative evaluation of those methods on two transcriptome datasets from The Cancer Genome Atlas, and we will discuss the need for new statistical methods.